Skip to content

fix: Fix overflow for vertex/edge insersion#8

Open
zhanglei1949 wants to merge 33 commits intomainfrom
fix-overflow
Open

fix: Fix overflow for vertex/edge insersion#8
zhanglei1949 wants to merge 33 commits intomainfrom
fix-overflow

Conversation

@zhanglei1949
Copy link
Collaborator

@zhanglei1949 zhanglei1949 commented Mar 9, 2026

Fix #7

Related Greptile view at https://github.com/GraphScope/neug/pull/1725

Greptile Summary

This PR fixes overflow bugs for vertex and edge insertions by introducing a centralized calculate_new_capacity growth policy (growth.h), explicit EnsureCapacity methods on VertexTable, EdgeTable, and PropertyGraph, and a statistics file (capacity + size) persisted at checkpoint time so the allocated capacity survives a dump/reopen cycle. It also removes the previous pattern of unconditionally resizing tables on every insertion and replaces it with amortized pre-allocation.

Key changes:

  • New growth.h: overflow-safe growth helpers (25% for vertex tables, ~20% for edge tables), replacing ad-hoc doublings scattered across the code.
  • EdgeTable: added capacity_ atomic, EnsureCapacity, Size, and Capacity; statistics file written to checkpoint and validated on open.
  • VertexTable / PropertyGraph: EnsureCapacity overloads replace the previous hard-coded Reserve(Capacity * 2) calls.
  • id_indexer.h: removed automatic 25% over-allocation on open(), capacity management now entirely caller-driven.
  • mmap_array<string_view>: compaction no longer shrinks the in-memory buffer (enabling in-place appends); ftruncate in stream_compact_and_dump preserves the full buffer size in the output file; pos_ is now persisted as a separate .pos file rather than recomputed from data_size().
  • column.cc: fixed missing return in TypedEmptyColumn::get_view.

Issues found:

  • update_transaction.cc AddEdge: the WAL entry is serialized and op_num_ incremented before the EnsureCapacity check, unlike AddVertex which was corrected to check first. A failing EnsureCapacity leaves a dangling WAL entry.
  • column.h init_pos: when the .pos file is absent, pos_ is initialized to 0 instead of buffer_.data_size(), causing new string inserts to overwrite existing data in any column opened from a checkpoint that predates the .pos file.
  • mmap_array.h avg_size(): iterates all items on every call; used inside the resize path in TypedColumn<string_view>::resize, making capacity growth O(n) in the number of existing strings.

Confidence Score: 3/5

  • The PR contains two correctness issues — WAL/edge ordering and pos_ fallback — that should be addressed before merging.
  • The core capacity-growth logic is sound and well-tested. However, the WAL serialization ordering bug in AddEdge and the pos_=0 fallback in init_pos are genuine correctness issues that could cause data inconsistency under error conditions or when a string column is opened from a checkpoint created before this PR.
  • src/transaction/update_transaction.cc (WAL ordering for AddEdge) and include/neug/utils/property/column.h (pos_ fallback in init_pos)

Important Files Changed

Filename Overview
src/transaction/update_transaction.cc AddVertex correctly moved EnsureCapacity before WAL serialization, but AddEdge still serializes the WAL entry before EnsureCapacity, creating an inconsistency that can leave a dangling WAL entry if capacity growth fails.
include/neug/utils/mmap_array.h Removed data_.resize() from compact() to preserve buffer capacity; added ftruncate in stream_compact_and_dump to extend the output file to the original buffer size (intentional design for in-place appends). New avg_size() is O(n) and is called on every resize, which is a performance concern. Also added unistd.h include for ftruncate.
include/neug/utils/property/column.h Replaced inline pos_=buffer_.data_size() with init_pos() that reads pos_ from a persistent .pos file; when the file is absent, pos_ is set to 0 instead of buffer_.data_size(), which can cause new string inserts to overwrite existing data.
src/storages/graph/edge_table.cc Added capacity_ atomic and EnsureCapacity/Size/Capacity methods; pre-allocates edge table capacity before batch insertions instead of resizing after each insert; adds statistics file (capacity + size) to checkpoint; removed table_->resize() from batch_add_unbundled_edges_impl since capacity is pre-allocated by the caller.
src/storages/graph/vertex_table.cc Added EnsureCapacity() that delegates to Reserve(); simplified Open() signature by removing build_empty_graph flag; fixed Reserve() to guard table_->resize() with a size check to prevent spurious shrinks; moved v_ts_.Reserve() outside the table_ null check.
src/storages/graph/property_graph.cc Added EnsureCapacity() overloads for vertices and edges; Dump() now pre-allocates capacity via EnsureCapacity before dumping and persists vertex_capacity for CSR resize; Open() refactored to eliminate code duplication; Reserve() fixed for self-loop edge labels (v_label==dst_label guard added).
include/neug/utils/growth.h New file centralising capacity growth logic: vertex tables grow ~25% and edge tables ~20%; includes overflow guards using numeric_limits; semantics of is_vertex_table bool are clear in context but the naming could be improved.
src/utils/file_utils.cc Moved read_file/write_file from mutable_csr.cc to a shared utility; added write_statistic_file/read_statistic_file helpers; write_file now throws on fwrite failure (fixes previous silent-swallow bug noted in prior review).
src/storages/graph/graph_interface.cc AddVertex and AddEdge now call EnsureCapacity before inserting; AddVertex correctly removed the hard-coded Reserve(Capacity*2) fallback; EnsureCapacity failures are propagated as false return values.
include/neug/utils/id_indexer.h Removed automatic 25% over-allocation on all four open() variants and the redundant keys_->resize() in dump(); capacity management now delegated entirely to callers through EnsureCapacity.
src/utils/property/column.cc Fixed a trivial bug in TypedEmptyColumn::get_view: added missing return statement so it no longer falls off the end of a non-void function.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant UpdateTransaction
    participant PropertyGraph
    participant VertexTable
    participant EdgeTable

    Note over Caller,EdgeTable: AddVertex flow (corrected in this PR)
    Caller->>UpdateTransaction: AddVertex(label, oid, props)
    UpdateTransaction->>PropertyGraph: EnsureCapacity(label)
    PropertyGraph->>VertexTable: EnsureCapacity(capacity)
    VertexTable->>VertexTable: Reserve(capacity)
    VertexTable-->>PropertyGraph: new_capacity
    PropertyGraph-->>UpdateTransaction: Status::OK
    UpdateTransaction->>UpdateTransaction: InsertVertexRedo::Serialize + op_num_++
    UpdateTransaction->>PropertyGraph: AddVertex(...)

    Note over Caller,EdgeTable: AddEdge flow (ordering issue remains)
    Caller->>UpdateTransaction: AddEdge(src, dst, edge_label, props)
    UpdateTransaction->>UpdateTransaction: InsertEdgeRedo::Serialize + op_num_++ ⚠️
    UpdateTransaction->>PropertyGraph: EnsureCapacity(src, dst, edge_label)
    PropertyGraph->>EdgeTable: EnsureCapacity(capacity)
    EdgeTable->>EdgeTable: table_->resize(capacity)
    EdgeTable-->>PropertyGraph: ok
    PropertyGraph-->>UpdateTransaction: Status::OK
    UpdateTransaction->>PropertyGraph: AddEdge(...)

    Note over Caller,EdgeTable: BatchAddVertices flow
    Caller->>PropertyGraph: BatchAddVertices(v_label, supplier)
    PropertyGraph->>VertexTable: insert_vertices(supplier)
    loop per batch
        VertexTable->>VertexTable: EnsureCapacity(calculate_new_capacity(new_size))
        VertexTable->>VertexTable: insert_primary_keys + set_properties
    end
Loading

Comments Outside Diff (6)

  1. src/transaction/update_transaction.cc, line 738-746 (link)

    Redo log serialized before capacity check — inconsistent transaction state on failure

    InsertEdgeRedo::Serialize and op_num_ += 1 are executed on lines 738–741 before EnsureCapacity is checked on line 742. If EnsureCapacity returns an error (e.g., the edge label triplet doesn't exist), the function returns false but the redo log (arc_) already contains a serialized edge entry and op_num_ has been incremented. This leaves the transaction's redo log in an inconsistent state and could cause incorrect replay during crash recovery.

    This is directly inconsistent with the pattern used in the AddVertex function in this same file (lines 683–691), which correctly calls EnsureCapacity before calling InsertVertexRedo::Serialize and incrementing op_num_.

    The fix is to move the EnsureCapacity call (and early return on failure) before the serialize/increment:

  2. src/transaction/update_transaction.cc, line 738-746 (link)

    Redo log serialized before capacity check — transaction log corruption on failure

    InsertEdgeRedo::Serialize (line 738) and op_num_ += 1 (line 741) execute before EnsureCapacity is checked. If EnsureCapacity returns an error and the function returns false, the redo log entry has already been appended and op_num_ incremented, but the edge was never actually inserted. This leaves the transaction log in an inconsistent state and could cause incorrect replay during recovery.

    This is directly inconsistent with the fixed AddVertex path in the same file (lines 683–692), where serialization is correctly placed after the capacity check:

    // AddVertex (correct ordering):
    auto status = graph_.EnsureCapacity(label);
    if (!status.ok()) { return false; }
    InsertVertexRedo::Serialize(arc_, label, oid, props);
    op_num_ += 1;
    status = graph_.AddVertex(...);

    The AddEdge block should follow the same pattern:

    auto status = graph_.EnsureCapacity(src_label, dst_label, edge_label);
    if (!status.ok()) {
      LOG(ERROR) << "Failed to ensure space before insert edge: "
                 << status.ToString();
      return false;
    }
    InsertEdgeRedo::Serialize(arc_, src_label, GetVertexId(src_label, src_lid),
                              dst_label, GetVertexId(dst_label, dst_lid),
                              edge_label, properties);
    op_num_ += 1;
    auto oe_offset = graph_.AddEdge(...);
  3. src/transaction/update_transaction.cc, line 738-746 (link)

    Redo log serialized before capacity check

    InsertEdgeRedo::Serialize (line 738) and op_num_ += 1 (line 741) are called before EnsureCapacity (line 742). If EnsureCapacity fails and the function returns false, the WAL already contains the edge-insert entry and op_num_ has been incremented — but the edge was never actually written to the graph.

    On WAL replay after a crash at this point, the replayed edge insertion would fail again (or worse, succeed spuriously with uninitialised capacity), corrupting the graph state.

    The AddVertex path (line 683) does this correctly: it calls EnsureCapacity first, and only then serializes the redo entry. AddEdge should follow the same ordering:

    // Correct order — mirror AddVertex:
    auto status = graph_.EnsureCapacity(src_label, dst_label, edge_label);
    if (!status.ok()) {
      LOG(ERROR) << "Failed to ensure space before insert edge: " << status.ToString();
      return false;
    }
    InsertEdgeRedo::Serialize(arc_, src_label, GetVertexId(src_label, src_lid),
                              dst_label, GetVertexId(dst_label, dst_lid),
                              edge_label, properties);
    op_num_ += 1;
  4. src/storages/graph/edge_table.cc, line 263-264 (link)

    reserve reserves 0 instead of the intended size

    edge_data is freshly default-constructed, so edge_data.size() is 0. The call edge_data.reserve(edge_data.size()) is a no-op and the intent was clearly to pre-allocate property_vec.size() slots to avoid repeated reallocations during the loop below.

  5. include/neug/utils/mmap_array.h, line 365-380 (link)

    avg_size() is O(n) and is called in a resize hot path

    avg_size() iterates over every item in the buffer to compute the average string length. This is called from TypedColumn<std::string_view>::resize, which is invoked from insert_vertices_impl whenever the vertex table grows. For tables with millions of string-typed properties, this becomes an O(n) scan inside what is essentially the capacity-growth logic, adding significant overhead that scales with the number of existing entries.

    Consider tracking the running total length incrementally (e.g., as an atomic member updated on each set call) rather than recomputing it on every call:

    size_t avg_size() const {
      // total_data_length_ and non_zero_count_ could be maintained incrementally
      return non_zero_count_ > 0 ? (total_data_length_ + non_zero_count_ - 1) / non_zero_count_ : 0;
    }

    Alternatively, cap the sample to the first N entries (e.g., 1024) to bound the cost.

  6. include/neug/utils/property/column.h, line 522-527 (link)

    pos_ initialized to 0 when .pos file is absent — corrupts strings on first post-dump insert

    When the .pos file does not exist (e.g., first open from a freshly-created or migrated database before any Dump), pos_ is set to 0. If the column was opened from an existing checkpoint that already contains string data (the .items and .data files were copied from elsewhere), new string inserts will start writing at offset 0, silently overwriting all previously stored string values.

    The pre-existing behavior in all four open* paths was:

    pos_ = buffer_.data_size();

    which safely positions the write cursor at the end of existing data. The new init_pos no longer does this as a fallback.

    When the .pos file does not exist, falling back to buffer_.data_size() (the actual used extent) rather than 0 would be the safe default:

    inline void init_pos(const std::string& file_path) {
      if (std::filesystem::exists(file_path)) {
        read_file(file_path, &pos_, sizeof(pos_), 1);
      } else {
        pos_.store(buffer_.data_size());  // safe fallback: append after existing data
      }
    }

Last reviewed commit: 5d6db2c

Greptile also left 1 inline comment on this PR.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets the 4096 insert failure (issue #7) by introducing proactive capacity growth for vertex/edge storage structures so that sequential CREATE statements can continue past the default reserved space.

Changes:

  • Add EnsureCapacity APIs for vertex/edge storage and call them before inserts to avoid “reserved space exhausted” failures.
  • Introduce edge-table capacity tracking for unbundled edges and expose capacity/size helpers across storage layers.
  • Add/extend Python binding tests for high-volume vertex/edge insertions; adjust some logging/error handling paths.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
tools/python_bind/tests/test_tp_service.py Adds TP service stress tests for inserting many vertices/edges; also updates execute() call style to pass access_mode.
tools/python_bind/tests/test_db_query.py Adds embedded/local test for inserting many edges and validating count.
src/utils/property/column.cc Fixes missing return in TypedEmptyColumn<T>::get_view.
src/transaction/update_transaction.cc Calls graph_.EnsureCapacity(...) before vertex/edge inserts.
src/storages/graph/vertex_table.cc Adds VertexTable::EnsureCapacity helper.
src/storages/graph/property_graph.cc Adds PropertyGraph::EnsureCapacity for vertices/edges and improves vertex add error message.
src/storages/graph/graph_interface.cc Calls EnsureCapacity in AP update interface before vertex/edge inserts.
src/storages/graph/edge_table.cc Tracks/resizes unbundled edge property-table capacity and adds EdgeTable::EnsureCapacity.
src/storages/csr/mutable_csr.cc Adds CSR capacity() API implementation.
src/storages/csr/immutable_csr.cc Adds CSR capacity() API implementation.
src/server/neug_db_session.cc Adds exception handling around transaction commit and returns structured internal errors.
src/compiler/planner/gopt_planner.cc Downgrades compilePlan query logging from INFO to VLOG(1).
include/neug/storages/graph/vertex_table.h Declares EnsureCapacity and adds Size() accessor.
include/neug/storages/graph/property_graph.h Declares EnsureCapacity APIs (vertex and edge).
include/neug/storages/graph/edge_table.h Declares EnsureCapacity, adds size/capacity helpers and capacity tracking member.
include/neug/storages/csr/csr_base.h Adds pure virtual capacity() API for CSRs.
include/neug/storages/csr/mutable_csr.h Declares capacity() overrides for mutable CSRs.
include/neug/storages/csr/immutable_csr.h Declares capacity() overrides for immutable CSRs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949 zhanglei1949 force-pushed the fix-overflow branch 3 times, most recently from db68d39 to ac3638f Compare March 9, 2026 09:42
@zhanglei1949 zhanglei1949 requested review from liulx20 and lnfjpt March 10, 2026 02:23
@zhanglei1949
Copy link
Collaborator Author

@greptile

Committed-by: xiaolei.zl from Dev container

Committed-by: xiaolei.zl from Dev container

add capacity api

Committed-by: xiaolei.zl from Dev container

fix CI

Committed-by: xiaolei.zl from Dev container

remove tests for TP

Committed-by: xiaolei.zl from Dev container

Committed-by: xiaolei.zl from Dev container

fixing

Committed-by: xiaolei.zl from Dev container

Committed-by: xiaolei.zl from Dev container

Committed-by: xiaolei.zl from Dev container

fix

Update src/storages/graph/edge_table.cc

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

fixes

fix

use explicit capacity calculation for EnsureCapacity
@zhanglei1949
Copy link
Collaborator Author

@greptile

@zhanglei1949
Copy link
Collaborator Author

@greptile

zhanglei1949 and others added 3 commits March 11, 2026 15:03
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@zhanglei1949
Copy link
Collaborator Author

@greptile

zhanglei1949 and others added 3 commits March 11, 2026 18:03
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Louyk14 pushed a commit that referenced this pull request Mar 12, 2026
- Add `.clang-format`.
- Remove constraint of singleton `GraphDB`.
zhanglei1949 and others added 13 commits March 12, 2026 12:03
Committed-by: xiaolei.zl from Dev container
…acOS (#29)

* Generate -I for some third-party libs (e.g., protobuf, arrow)

* fix
* enable codecov

Committed-by: nengli.ln from Dev container

Committed-by: nengli.ln from Dev container

* add ignore for coverage

Committed-by: nengli.ln from Dev container

Committed-by: nengli.ln from Dev container
* support varchar(max_length) in NeuG type system

Committed-by: Xiaoli Zhou from Dev container

* skip specific regex when comparing physical plans

Committed-by: Xiaoli Zhou from Dev container

* minor fix according to review

Committed-by: Xiaoli Zhou from Dev container

* fix ci

Committed-by: xiaolei.zl from Dev container

Committed-by: xiaolei.zl from Dev container

* avoid change func api

Committed-by: xiaolei.zl from Dev container

* ci strange failure

Committed-by: xiaolei.zl from Dev container

---------

Co-authored-by: xiaolei.zl <xiaolei.zl@alibaba-inc.com>
* add test for inserting string in embedding mode

Committed-by: xiaolei.zl from Dev container

* format

Committed-by: xiaolei.zl from Dev container

* Update tools/python_bind/tests/test_db_query.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update tools/python_bind/tests/test_db_query.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update tools/python_bind/tests/test_db_query.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update tools/python_bind/tests/test_db_query.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* format

Committed-by: xiaolei.zl from Dev container

* fix

Committed-by: xiaolei.zl from Dev container

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@zhanglei1949
Copy link
Collaborator Author

@greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CREATE vertex fails

5 participants